Protein Family Databases for Automated Protein Domain Identification

Author

ERIK L.L. SONNHAMMER

Abstract

Automatic identification and annotation of protein domains is a major challenge for genome sequencing projects. Simply transferring the annotation from the most similar protein of known function is relatively reliable for prokaryotic proteins, but it often produces misleading and incomplete results for multi-domain proteins, which are common in higher organisms. An alternative approach is to classify protein domains by matching them against a precompiled database of protein domain families. A number of such databases are reviewed here, including an update on the Pfam database. Examples of known multi-domain proteins illustrate the differences a user can expect when using different databases for domain identification. The advantages and drawbacks of single-sequence versus multiple-alignment methods are also discussed. The degree of protein modularity was surveyed in the genomes of Caenorhabditis elegans, Saccharomyces cerevisiae, and Haemophilus influenzae by matching them to Pfam. Prokaryotic genomes typically contain a small fraction of multi-domain proteins, which rarely have more than three domains, whereas at least 10% of higher eukaryotic proteins have multiple domains, often with dozens of domains per protein chain.
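The modularity survey described above amounts to grouping domain-family matches by protein and counting domains per chain. The sketch below illustrates that bookkeeping under stated assumptions: the hit tuples, protein IDs, and family names are hypothetical placeholders (not real Pfam output), and the helper names `domain_architectures` and `modularity_summary` are invented for illustration. It is not the method used in the paper.

```python
from collections import defaultdict

# Hypothetical domain hits: (protein_id, family, start, end).
# In practice these would come from matching a proteome against a
# domain-family database such as Pfam; these values are made up.
hits = [
    ("protA", "Kinase", 120, 380),
    ("protA", "SH2", 20, 110),
    ("protB", "Globin", 5, 140),
    ("protC", "EGF", 30, 65),
    ("protC", "EGF", 70, 105),
    ("protC", "EGF", 110, 145),
]


def domain_architectures(hits):
    """Group hits by protein and order each protein's domains N- to C-terminal."""
    by_protein = defaultdict(list)
    for protein, family, start, end in hits:
        by_protein[protein].append((start, end, family))
    # Sorting by (start, end) gives the domain order along the chain.
    return {p: [fam for _, _, fam in sorted(doms)] for p, doms in by_protein.items()}


def modularity_summary(architectures, total_proteins=None):
    """Return (fraction of proteins with >1 domain, max domains per chain).

    By default the denominator is the number of proteins with at least one
    hit; pass total_proteins to use the whole proteome instead.
    """
    counts = [len(doms) for doms in architectures.values()]
    if not counts:
        return 0.0, 0
    multi = sum(1 for c in counts if c > 1)
    denominator = total_proteins if total_proteins else len(counts)
    return multi / denominator, max(counts)


arch = domain_architectures(hits)
fraction, most = modularity_summary(arch)
print(arch)
print(f"multi-domain fraction: {fraction:.2f}, max domains per chain: {most}")
```

With the toy data, protA is reported as SH2 followed by Kinase (ordered by start position), and two of the three proteins with hits are multi-domain. The choice of denominator matters: a proteome-wide figure such as the one quoted in the abstract should divide by all proteins, not only those with a database match.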

Similar resources

The Protein Information Resource

The Protein Information Resource (PIR) is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery. PIR maintains the Protein Sequence Database (PSD), an annotated protein database containing over 283 000 sequences covering the entire taxonomic range. Family classification is used for sensitive identification, consistent annotati...

ADDA: a domain database with global coverage of the protein universe

We used the Automatic Domain Decomposition Algorithm (ADDA) to generate a database of protein domain families with complete coverage of all protein sequences. Sequences are split into domains and domains are grouped into protein domain families in a completely automated process. The current database contains domains for more than 1.5 million sequences in more than 40,000 domain families. In par...

Designing a new tetrapeptide to inhibit the BIR3 domain of the XIAP protein via molecular dynamics simulations

The XIAP protein is a member of the inhibitor of apoptosis (IAP) protein family. XIAP plays a central role in the inhibition of apoptosis and consists of three Baculoviral IAP Repeat (BIR) domains. The BIR3 domain binds directly to the N-terminus of caspase-9 and thereby inhibits apoptosis. The N-terminal tetrapeptide region of the SMAC protein can bind to BIR3, inhibit it, and subsequently induce apoptosis. In th...

ProClass protein family database

ProClass is a protein family database that organizes non-redundant sequence entries into families defined collectively by PROSITE patterns and PIR superfamilies. By combining global similarities and functional motifs into a single classification scheme, ProClass helps to reveal domain and family relationships and classify multi-domain proteins. The database currently consists of more than 120 0...

Family Classification and Integrative Analysis for Protein Functional Annotation

High-throughput genome projects have resulted in a rapid accumulation of predicted protein sequences; however, experimentally verified information on protein function lags far behind. The common approach of inferring the function of uncharacterized proteins from sequence similarity to annotated proteins in sequence databases often results in over-identification, under-identification, or even...

Publication date: 1998
